Modular on-line function approximation for scaling up reinforcement learning

Author

  • Chen-Khong Tham
Abstract

Reinforcement learning is a powerful learning paradigm for autonomous agents which interact with unknown environments with the objective of maximizing cumulative payoff. Recent research has addressed issues concerning the scaling up of reinforcement learning methods in order to solve problems with large state spaces, composite tasks and tasks involving non-Markovian situations. In this dissertation, I extend existing ways of scaling up reinforcement learning methods and propose several new approaches. An array of Cerebellar Model Articulation Controller (CMAC) networks is used as fast function approximators so that the evaluation function and policy can be learnt on-line as the agent interacts with the environment. Learning systems which combine reinforcement learning techniques with CMAC networks are developed to solve problems with large state and action spaces. Actions can be either discrete or real-valued. The problem of generating a sequence of torque or position change commands in order to drive a simulated multi-linked manipulator towards desired arm configurations is examined. A hierarchical and modular function approximation architecture using CMAC networks is then developed, following the Hierarchical Mixtures of Experts framework. The non-linear function approximation ability of CMAC networks enables non-linear functions to be modelled in expert and gating networks, while permitting fast linear learning rules to be used. An on-line gradient ascent learning procedure derived from the Expectation Maximization algorithm is proposed, enabling faster learning to be achieved. The new architecture can be used to enable reinforcement learning agents to acquire context-dependent evaluation functions and policies. This is demonstrated in an implementation of the Compositional Q-Learning framework in which composite tasks consisting of several elemental tasks are decomposed using reinforcement learning. The framework is extended to the case where rewards can be received in non-terminal states of elemental tasks, and to 'vector of actions' situations where the agent produces several coordinated actions in order to achieve a goal. The resulting system is employed to enable the simulated multi-linked manipulator to position its end-effector at several positions in the workspace sequentially. Finally, the benefits of using prior knowledge in order to extend the capabilities of reinforcement learning agents are examined. A classifier system-based Q-learning scheme is developed to enable agents to reason using condition-action rules. The utility of this scheme is illustrated in a …
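
The central idea of the first part of the abstract, learning an evaluation function on-line with a CMAC-style coarse-coded approximator inside a Q-learning loop, can be illustrated with a small sketch. The code below is a hypothetical example, not the dissertation's implementation: the CMAC class, the tiling parameters, the learning rates, and the env.reset()/env.step() interface are all illustrative assumptions.

```python
import numpy as np

class CMAC:
    """Coarse-coded approximator: several offset tilings, one weight table each.
    Illustrative sketch only; parameter choices are not from the thesis."""

    def __init__(self, n_tilings, bins_per_dim, low, high, n_actions):
        self.n_tilings = n_tilings
        self.bins = bins_per_dim
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.n_actions = n_actions
        # One weight table per tiling, one weight vector (over actions) per tile.
        self.w = np.zeros((n_tilings, bins_per_dim ** len(low), n_actions))

    def _tiles(self, s):
        """Return the active tile index in each staggered tiling for state s."""
        span = self.high - self.low
        idxs = []
        for t in range(self.n_tilings):
            offset = (t / self.n_tilings) * span / self.bins
            coords = np.clip(((s - self.low + offset) / span * self.bins).astype(int),
                             0, self.bins - 1)
            flat = 0
            for c in coords:            # row-major flattening of the grid coordinates
                flat = flat * self.bins + int(c)
            idxs.append(flat)
        return idxs

    def q(self, s):
        """Q(s, .) is the sum of the active weights across tilings."""
        return sum(self.w[t, i] for t, i in enumerate(self._tiles(s)))

    def update(self, s, a, target, alpha):
        """Move Q(s, a) toward the bootstrapped target, splitting the step across tilings."""
        error = target - self.q(s)[a]
        for t, i in enumerate(self._tiles(s)):
            self.w[t, i, a] += (alpha / self.n_tilings) * error


def q_learning_episode(env, cmac, alpha=0.1, gamma=0.99, epsilon=0.1):
    """One on-line episode of epsilon-greedy Q-learning using the CMAC critic.
    Assumes env.reset() returns a state and env.step(a) returns (state, reward, done)."""
    s, done = env.reset(), False
    while not done:
        q = cmac.q(s)
        a = np.random.randint(cmac.n_actions) if np.random.rand() < epsilon else int(np.argmax(q))
        s2, r, done = env.step(a)
        target = r if done else r + gamma * np.max(cmac.q(s2))
        cmac.update(s, a, target, alpha)
        s = s2
```

Because each tiling contributes only one active weight per state, the update touches a handful of parameters per step, which is what makes this kind of approximator fast enough for on-line learning.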

Similar articles

Minimax-Based Reinforcement Learning with State Aggregation - Proceedings of the 37th IEEE Conference on Decision and Control, 1998

One of the most important issues in scaling up reinforcement learning for practical problems is how to represent and store cost-to-go functions with more compact representations than lookup tables. In this paper, we address the issue of combining the simple function approximation method, state aggregation, with minimax-based reinforcement learning algorithms and present the convergence theory fo...

Convergent Tree-Backup and Retrace with Function Approximation

Off-policy learning is key to scaling up reinforcement learning as it allows learning about a target policy from the experience generated by a different behavior policy. Unfortunately, it has been challenging to combine off-policy learning with function approximation and multi-step bootstrapping in a way that leads to both stable and efficient algorithms. In this paper, we show that the Tree Ba...

Sparse Approximations to Value Functions in Reinforcement Learning

We present a novel sparsification and value function approximation method for on-line reinforcement learning in continuous state and action spaces. Our approach is based on the kernel least squares temporal difference learning algorithm. We derive a recursive version and enhance the algorithm with a new sparsification mechanism based on the topology obtained from proximity graphs. The sparsific...

Novel Feature Selection and Kernel-Based Value Approximation Method for Reinforcement Learning

We present a novel sparsification and value function approximation method for on-line reinforcement learning in continuous state and action spaces. Our approach is based on the kernel least squares temporal difference learning algorithm. We derive a recursive version and enhance the algorithm with a new sparsification mechanism based on the topology maps represented by proximity graphs. The spa...

Reinforcement Algorithms Using Functional Approximation for Generalization and their Application to Cart Centering and Fractal Compression

We address the conflict between identification and control, or alternatively the conflict between exploration and exploitation, within the framework of reinforcement learning. Q-learning has recently become a popular off-policy reinforcement learning method. The conflict between exploration and exploitation slows down Q-learning algorithms; their performance does not scale up and degrades rapidly...

Publication date: 1994